IBM Model-based and Frame-by-frame Speaker Recognition

Authors

  • Homayoon S. M. Beigi
  • Upendra V. Chaudhari
Abstract

Amidst the recent crazes for emerging technologies like speech recognition and biometrics, speaker recognition is slowly reaching the maturity to be deemed practical in many different applications. This paper presents new approaches for text-independent speaker recognition. The performance of the model-based algorithm presented concurrently at the ICASSP'98 conference and that of the frame-based algorithm presented in this paper are compared here. The engine described here provides multiple functionalities, including identification, verification, and classification. The modes of operation and design choices allow for tight integration of the speech recognition and speaker recognition engines in a broad sense. This new architecture, as well as the results obtained for very specific tasks, heralds a myriad of new applications where both technologies complement each other and can no longer be clearly distinguished, as illustrated by the concept of speech biometrics. Hands-free and eyes-free human/machine transactions are moving a step further toward easier and more efficient interfaces, and speech transactions are becoming more ubiquitous.

Similar resources

The IBM 2016 Speaker Recognition System

In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify key techniques that contribute to significant improvements in performance of our system, and quantify their contributions. The techniques include: 1) a nearest-neighbor discriminant analysis (NDA) approach that is formulated to alleviate som...

Text-independent speaker recognition using non-linear frame likelihood transformation

When the reference speakers are represented by Gaussian mixture models (GMMs), the conventional approach is to accumulate the frame likelihoods over the whole test utterance and compare the results, as in speaker identification, or apply a threshold, as in speaker verification. In this paper we describe a method, where frame likelihoods are transformed into new scores according to some non-linear func...

In-set/out-of-set speaker identification based on discriminative speech frame selection

In this paper, we propose a novel discriminative speech frame selection (DSFS) scheme for the problem of in-set/out-of-set speaker identification, which seeks to decrease the similarity between speaker models and background model (or antispeaker model), and increase the accuracy of speaker identification. The working scheme of DSFS consists of two steps: speech frame analysis and discriminative...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Direct Modeling of Spoken Passwords for Text-dependent Speaker Recognition by Compressed Time-feature Representations

Traditional Text-Dependent Speaker Recognition (TDSR) systems model the user-specific spoken passwords with frame-based features such as MFCC and use DTW or HMM type classifiers to handle the variable length of the feature vector sequence. In this paper, we explore a direct modeling of the entire spoken password by a fixed-dimension vector called Compressed Feature Dynamics or CFD. Instead of t...

Publication date: 1998